Skip to content

refactor: simplify session lifecycle and zellij runtime#62

Merged
AgentWrapper merged 29 commits into
mainfrom
refactor/remove-canonical-lifecycle
Jun 1, 2026
Merged

refactor: simplify session lifecycle and zellij runtime#62
AgentWrapper merged 29 commits into
mainfrom
refactor/remove-canonical-lifecycle

Conversation

@AgentWrapper

@AgentWrapper AgentWrapper commented May 31, 2026

Copy link
Copy Markdown
Contributor

Summary

This PR simplifies the agent session lifecycle model and removes a large amount of now-unneeded runtime, notification, and storage plumbing.

Lifecycle/session model

  • Removes the canonical lifecycle/decide state machine and associated domain types/tests.
  • Removes notification domain/storage/rendering/dedupe code that depended on that lifecycle model.
  • Stores only durable session facts needed for status derivation: activity_state, activity_source, activity_last_at, and boolean is_terminated.
  • Derives user-facing session status at read time from session facts plus PR facts.
  • Keeps PR-driven agent nudges in lifecycle for CI failures, merge conflicts, and review feedback.

Runtime simplification

  • Removes tmux runtime adapter, tests, and related configurability.
  • Standardizes runtime behavior on Zellij only.
  • Removes per-session runtime_name/runtime configurability from storage and wiring.
  • Narrows session/runtime interfaces so session creation depends only on create/destroy, while liveness remains with reaper/terminal paths.
  • Adds Zellij version validation to ao doctor using the runtime's minimum supported version.

Storage/sqlite cleanup

  • Removes notification tables/queries/store code.
  • Moves SQLite store implementation files into backend/internal/storage/sqlite/store.
  • Regenerates sqlc with stronger Go types for IDs, enums, booleans, and PR/session fields.
  • Adds npm run sqlc as the explicit sqlc generation workflow.
  • Keeps generated storage rows internal to sqlite and uses domain types for app-level PR/session models.

Ports/domain/API cleanup

  • Moves PR persistence models into domain and keeps ports focused on boundary contracts/DTOs.
  • Removes unused broad port interfaces and narrows package-local interfaces where possible.
  • Splits mixed observation DTOs into PR-specific and runtime/activity-specific files.
  • Cleans stale docs/comments around lifecycle, daemon, terminal, and status behavior.
  • Renames HTTP terminal mux implementation files/symbols for clarity while preserving the /mux route.
  • Makes HTTP/terminal logger dependencies tolerate nil by falling back to slog.Default().

CLI and docs

  • Updates docs to reflect Zellij-only runtime support and derived session status.
  • Makes CLI commands reject unexpected positional arguments consistently.
  • Keeps ao doctor from opening/migrating SQLite; daemon remains the only DB migrator/writer.

Testing

  • npm run sqlc
  • npm run lint
  • cd backend && go test -race ./...

@AgentWrapper AgentWrapper force-pushed the refactor/remove-canonical-lifecycle branch 6 times, most recently from 1940c69 to 7e18aae Compare May 31, 2026 22:59
@greptile-apps

greptile-apps Bot commented May 31, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR is a substantial simplification of the session lifecycle model: the canonical CanonicalSessionLifecycle state machine (with its multi-field session state, termination reason, detecting quarantine, and is_alive flag) is replaced by two persisted facts — activity_state and is_terminated. User-facing status is derived at read time from those facts plus PR data in the pr table. The notifications subsystem (domain, storage, schema, sqlc, wiring) and the tmux runtime adapter are removed entirely, standardising on Zellij only.

  • The lifecycle.Manager is slimmed to two write-paths (MarkSpawned, MarkTerminated) plus observation appliers; the pr.Manager is a new package that persists PR observations and forwards merge/nudge events to the LCM; the reaper loses its RuntimeRegistry and uses a single injected ports.Runtime.
  • Migration 0004_simplify_session_state.sql drops session_state, termination_reason, is_alive, detecting columns, and runtime_name, adds is_terminated, and rewrites the session CDC triggers; the notifications migration 0002 is deleted and the notifications table is dropped in the 0004 Up path.

Confidence Score: 5/5

Safe to merge; the simplification is well-executed and the migration correctly handles both fresh and existing databases with IF EXISTS guards throughout.

The lifecycle model refactor correctly replaces the multi-field state machine with two persisted facts (activity_state + is_terminated) and derives display status at read time. The migration Up/Down paths are symmetric and cover both upgrade and fresh-DB cases. The anti-flap mechanism (ProbeDead+ProbeDead required, recent-activity window) is simpler than the old detecting quarantine but still protects against transient false positives. No logic errors or data-corruption paths were found in the changed code.

reactions.go — the sendOnce dedup state is not session-scoped and never pruned; low practical risk given PR URL uniqueness, but worth addressing before this path sees heavy use.

Important Files Changed

Filename Overview
backend/internal/lifecycle/reactions.go Reduced to agent-only nudges via sendOnce; dedup state (seen/attempts maps) is not session-scoped and is never pruned on session termination or restart
backend/internal/lifecycle/manager.go Simplified to MarkSpawned/MarkTerminated plus ApplyActivitySignal/ApplyRuntimeObservation; mutate pipeline correctly diffs before writing
backend/internal/storage/sqlite/migrations/0004_simplify_session_state.sql Correctly drops old session columns/notifications and adds is_terminated; IF EXISTS guards cover both fresh-DB and upgrade paths; Down section restores pre-migration schema with best-effort state mapping
backend/internal/pr/manager.go New package cleanly separates PR persistence from lifecycle; correctly sequences write to terminate/nudge; nil guards on sessions and lifecycle fields are test-only affordances
backend/internal/domain/status.go DeriveStatus simplified to is_terminated+PR branch then activity-state fallthrough; removes StatusSpawning/StatusDetecting/StatusErrored/StatusKilled/StatusDone/StatusCleanup
backend/internal/observe/reaper/reaper.go Reaper simplified: RuntimeRegistry removed, single ports.Runtime injected; probeOne builds facts correctly; no longer calls TickEscalations
backend/internal/storage/sqlite/store/session_store.go New session store layer with clean domain-to-gen mapping; normalActivity helper ensures default state/source/timestamp are always set before writes
backend/internal/storage/sqlite/store/pr_facts.go ListPRFactsForSession issues one ListPRComments query per PR, an N+1 pattern that is benign for typical single-PR sessions but could scale poorly
backend/internal/lifecycle/decide_bridge.go runtimeClearlyDead replaces the full probe decider; correctly requires both Runtime and Process == ProbeDead plus no recent activity before marking dead
backend/internal/session/manager.go Kill/Restore/Cleanup updated to use IsTerminated boolean; displayPR helper correctly prefers an open PR over merged/closed ones when deriving display status

Sequence Diagram

sequenceDiagram
    participant GH as GitHub Tracker
    participant PRM as pr.Manager
    participant LCM as lifecycle.Manager
    participant Store as SQLite Store
    participant Reaper as Reaper

    Note over GH,Reaper: PR observation flow (new model)
    GH->>PRM: ApplyObservation(id, obs)
    PRM->>Store: WritePR(row, checks, comments)
    alt "obs.Merged == true"
        PRM->>LCM: MarkTerminated(id)
        LCM->>Store: "UpdateSession is_terminated=true activity=exited"
    else obs open or CI failing or review
        PRM->>LCM: ApplyPRObservation(id, obs)
        LCM->>Store: GetSession(id)
        LCM-->>LCM: sendOnce(key, sig, msg)
        LCM->>LCM: messenger.Send(id, msg)
    end

    Note over Reaper,LCM: Runtime liveness probe (new model)
    Reaper->>Store: ListAllSessions()
    loop each non-terminated session
        Reaper->>Reaper: zellij.IsAlive(handle)
        alt ProbeDead and no recent activity
            Reaper->>LCM: ApplyRuntimeObservation(id, facts)
            LCM->>Store: "UpdateSession is_terminated=true activity=exited"
        end
    end

    Note over LCM,Store: Status derivation on read
    Store-->>LCM: SessionRecord plus PRFacts list
    LCM-->>LCM: displayPR selects best PRFacts
    LCM-->>LCM: DeriveStatus returns SessionStatus
Loading

Reviews (6): Last reviewed commit: "refactor: align storage and runtime obse..." | Re-trigger Greptile

Comment thread backend/internal/lifecycle/manager.go Outdated
@AgentWrapper AgentWrapper force-pushed the refactor/remove-canonical-lifecycle branch from 7e18aae to 08c8adb Compare May 31, 2026 23:30
@AgentWrapper AgentWrapper force-pushed the refactor/remove-canonical-lifecycle branch from 08c8adb to 77456b0 Compare May 31, 2026 23:33
Comment thread backend/internal/lifecycle/reactions.go Outdated
@AgentWrapper AgentWrapper changed the title refactor: remove canonical lifecycle state refactor: simplify session lifecycle and zellij runtime Jun 1, 2026
@AgentWrapper AgentWrapper merged commit a34094e into main Jun 1, 2026
7 checks passed
AgentWrapper added a commit that referenced this pull request Jun 1, 2026
Co-authored-by: itrytoohard <ayetrytoohard@gmail.com>
harshitsinghbhandari added a commit that referenced this pull request Jun 1, 2026
)

* feat(scm): GitHub provider adapter — Observe(prURL) → PRObservation

A fresh GitHub SCM provider adapter under
backend/internal/adapters/scm/github/ exposing one method:

  (*Provider).Observe(ctx, prURL) (ports.PRObservation, error)

It performs a REST GET on /repos/{o}/{r}/pulls/{n} for the authoritative
draft/merged/closed/head-SHA, one GraphQL query for the reviewDecision +
mergeStateStatus + statusCheckRollup + unresolved review threads, and
(only for failure-class CheckRuns) a REST GET on
/actions/jobs/{job_id}/logs to splice the last 20 lines of the failed
job into the observation.

The package is the observation primitive; the polling loop, cadence
selection, daemon wiring, persistence and webhook receiver are all
intentionally out of scope (separate PRs / lanes).

Closes #27 — this supersedes PR #28's attempt, which targeted types
(domain.SCMProvider / SCMSnapshot / ports.SCMObserveRequest) that the
PR #62 simplification refactor has since removed. The GraphQL queries
and mergeability composition logic are credited to @whoisasx from
PR #28's provider.go; the package was re-implemented against the
current ports.PRObservation seam (post-#62) rather than rebased.

Bot-author detection uses ONLY GitHub's typed signal (__typename
"Bot" / User.Type "Bot"). The strings.Contains(login, "bot") fallback
from PR #28 was intentionally dropped — aa-18's review flagged it as
a false-positive magnet for logins like "robothon" / "lambot123".

46 table-driven tests against httptest.NewServer cover happy path,
draft, merged, closed (not merged), CI passing/failing/pending,
StatusContext legacy, log-tail extraction (and the best-effort
log-fetch failure case), mergeability mergeable/conflicting/blocked
(including ci-failing → blocked even when GitHub still says CLEAN —
the load-bearing aa-18 contract)/unstable/unknown, review
approved/changes-requested/required/none, bot-author filtering
(including the robothon false-positive guard), unresolved-only
threads, all-bots → empty Comments, ETag-304 cache hit, primary +
secondary rate-limit (with errors.As → *RateLimitError), 401 →
ErrAuthFailed, malformed JSON → Fetched:false, network error →
Fetched:false, Authorization Bearer header injection,
StaticTokenSource blank/whitespace rejection, GHTokenSource memoize
+ invalidate.

Verification:
- go build ./...               clean
- go vet ./...                 clean
- gofmt -l backend/internal/adapters/scm/   clean
- golangci-lint run ./... (v2.12, repo .golangci.yml)   0 issues
- go test -race ./internal/adapters/scm/github/...      46/46 PASS

References:
- aa-18 review of PR #28: ~/.ao/agent-reports/aa-18.md
- aa-26 tracker adapter (sibling Go-adapter pattern): #36 / agent-reports/aa-26.md

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(scm): address greptile review on #69

Four fixes from the greptile review of PR #69:

1. CI rollup pagination (P1) — when GraphQL reports
   pageInfo.hasNextPage=true for the statusCheckRollup contexts, a
   visible "all passing" set could be hiding a failing context on the
   next page. ciSummaryFromGraphQL now degrades Passing / Pending /
   Unknown to CIUnknown in that case; a known CIFailing on the visible
   page is still safe and is NOT degraded. Also bumped the per-page
   limit from 50 to 100 (GraphQL's documented max for the contexts
   connection). Two new tests pin both branches.

2. Empty GraphQL inline fragment (P2) — dropped
   `... on User { }` from the reviewThreads author selection. The
   empty selection set was technically invalid GraphQL and a future
   API tightening could reject the query. __typename already tells us
   whether the actor is a Bot, so the fragment carried no information.

3. rest.MergeStateStatus dead-code (P2) — the field decoded from the
   non-existent REST `merge_state_status` was always empty, making the
   firstNonEmpty fallback dead code. Removed the field and switched
   the tiebreaker to rest.MergeableState (the actual REST field, upper-
   cased so the same switch covers both GraphQL and REST shapes).

4. Wrong Accept header on /actions/jobs/{id}/logs (P2) — GitHub's
   REST API validates the Accept header before issuing the 302 to the
   log blob; sending text/plain risks a 406. Switched to the canonical
   application/vnd.github+json; the redirected blob serves text/plain
   regardless.

Verification:
- go build ./...               clean
- go vet ./...                 clean
- golangci-lint run ./...      0 issues
- go test -race ./internal/adapters/scm/github/...   48 / 48 PASS

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
AgentWrapper pushed a commit that referenced this pull request Jun 1, 2026
…pec API, regen)

Rebased onto main, which landed several changes that intersect this branch:

- writeProjectError is now 3-arg (status derived from project.Error.Kind); the
  get handler's degenerate-GetResult guard now calls the 3-arg form.
- apispec.Spec dropped its YAML() method; add apispec.Embedded() returning the
  raw embedded bytes, used by the generator's drift + route-parity tests.
- #62 removed Runtime/Reactions/ReactionConfig from the project types, so the
  regenerated openapi.yaml (and frontend schema.d.ts) drop those, and the dead
  ReactionConfig entry is removed from the schema-name map.
- CI go.yml: keep main's new golangci-lint `lint` job alongside the gen-verify
  job, both reading the Go version from go.mod.

go build/vet/test all pass; spec + TS regenerate with no drift; gofmt clean.
AgentWrapper added a commit that referenced this pull request Jun 1, 2026
Co-authored-by: itrytoohard <ayetrytoohard@gmail.com>
yyovil added a commit that referenced this pull request Jun 2, 2026
Re-integrate the agent-adapter layer onto main's rewritten session lane.

main (#62/#66/#67/#68/#69) moved the session manager into session_manager
plus a service layer, reworked the domain (Activity, IsTerminated), and
removed the tmux runtime. This merge:

- Keeps the rich, hooks-capable ports.Agent (in internal/ports) as the
  canonical agent contract; session_manager.Manager now resolves a real
  adapter per session via ports.AgentResolver (from cfg.Harness on Spawn,
  the stored harness on Restore) and builds RuntimeConfig.Argv.
- Drops the tmux runtime; standardizes on zellij.
- Re-adds the daemon's per-session agent resolver wiring
  (buildAgentResolver over the claude-code + codex registry, AO_AGENT
  default), ready to plug into the httpd session-route slot.

go build, go vet, and go test -race are green; the zellij/tmux real
integration tests are environment-gated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants